2,479 research outputs found

    AliClu - Temporal sequence alignment for clustering longitudinal clinical data

    The authors acknowledge funding from the Portuguese Foundation for Science and Technology (Fundação para a Ciência e a Tecnologia - FCT) under contracts INESC-ID (UID/CEC/50021/2019) and IT (UID/EEA/50008/2019), projects PREDICT (PTDC/CCI-CIF/29877/2017), PERSEIDS (PTDC/EMS-SIS/0642/2014) and NEUROCLINOMICS2 (PTDC/EEI-SII/1937/2014). The funders had no role in the design of the study, collection, analysis and interpretation of data, or writing the manuscript. BACKGROUND: Patient stratification is a critical task in clinical decision making since it can allow physicians to choose treatments in a personalized way. Given the increasing availability of electronic medical records (EMRs) with longitudinal data, one crucial problem is how to efficiently cluster the patients based on the temporal information from medical appointments. In this work, we propose applying the Temporal Needleman-Wunsch (TNW) algorithm to align discrete sequences with the transition time information between symbols. These symbols may correspond to a patient's current therapy, their overall health status, or any other discrete state. The transition time information represents the duration of each of those states. The obtained TNW pairwise scores are then used to perform hierarchical clustering. To find the best number of clusters and assess their stability, a resampling technique is applied. RESULTS: We propose the AliClu, a novel tool for clustering temporal clinical data based on the TNW algorithm coupled with clustering validity assessments through bootstrapping. The AliClu was applied for the analysis of the rheumatoid arthritis EMRs obtained from the Portuguese database of rheumatologic patient visits (Reuma.pt). In particular, the AliClu was used for the analysis of therapy switches, which were coded as letters corresponding to biologic drugs and included their durations before each change occurred.
The obtained optimized clusters allow one to stratify the patients based on their temporal therapy profiles and to support the identification of common features for those groups. CONCLUSIONS: The AliClu is a promising computational strategy to analyse longitudinal patient data by providing validated clusters and by unravelling the patterns that exist in clinical outcomes. Patient stratification is performed in an automatic or semi-automatic way, allowing one to tune the alignment, clustering, and validation parameters. The AliClu is freely available at https://github.com/sysbiomed/AliClu.
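
The idea of aligning sequences of states together with their durations can be illustrated with a small sketch. This is not the published TNW scoring scheme and not AliClu's code: it is a plain Needleman-Wunsch global alignment in which a matching pair of symbols is penalized in proportion to the difference of their durations. The function name `temporal_nw_score`, the `(symbol, duration)` encoding, and all parameter values are assumptions made for illustration.

```python
def temporal_nw_score(seq_a, seq_b, match=1.0, mismatch=-1.0, gap=-1.0,
                      time_weight=0.1):
    """Global alignment score for two lists of (symbol, duration) pairs.

    Hypothetical scoring: equal symbols earn `match` minus a penalty
    proportional to the absolute difference of their durations.
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score aligning the first i items of seq_a
    # with the first j items of seq_b.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        sym_a, dur_a = seq_a[i - 1]
        for j in range(1, m + 1):
            sym_b, dur_b = seq_b[j - 1]
            if sym_a == sym_b:
                pair = match - time_weight * abs(dur_a - dur_b)
            else:
                pair = mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two items
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[n][m]


# Made-up therapy sequences: (drug letter, months on that drug).
patient_1 = [("A", 6), ("B", 12)]
patient_2 = [("A", 6), ("B", 12)]   # same drugs, same durations
patient_3 = [("A", 6), ("B", 2)]    # same drugs, different duration
patient_4 = [("C", 6), ("D", 12)]   # different drugs

same = temporal_nw_score(patient_1, patient_2)
shifted = temporal_nw_score(patient_1, patient_3)
different = temporal_nw_score(patient_1, patient_4)
```

Pairwise scores computed this way for all patients could be turned into a distance matrix and passed to a hierarchical clustering routine, which is the role the abstract gives to the TNW scores.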

    Discriminative learning of Bayesian networks via factorized conditional log-likelihood

    We propose an efficient and parameter-free scoring criterion, the factorized conditional log-likelihood (f̂CLL), for learning Bayesian network classifiers. The proposed score is an approximation of the conditional log-likelihood criterion. The approximation is devised in order to guarantee decomposability over the network structure, as well as efficient estimation of the optimal parameters, achieving the same time and space complexity as the traditional log-likelihood scoring criterion. The resulting criterion has an information-theoretic interpretation based on interaction information, which exhibits its discriminative nature. To evaluate the performance of the proposed criterion, we present an empirical comparison with state-of-the-art classifiers. Results on a large suite of benchmark data sets from the UCI repository show that f̂CLL-trained classifiers achieve at least as good accuracy as the best compared classifiers, using significantly fewer computational resources.

    Bivariate Extreme Statistics, II

    We review the current state of statistical modeling of asymptotically independent data. Our discussion includes necessary and sufficient conditions for asymptotic independence, results on the asymptotic independence of statistics of interest, estimation and inference issues, joint tail modeling, and conditional approaches. For each of these topics we give an account of existing approaches and relevant methods for data analysis and applications.

    Joint models for longitudinal and survival analysis

    Funding Information: This study was supported by the Portuguese Foundation for Science and Technology (Fundação para a Ciência e Tecnologia) through the Instituto de Telecomunicações (UIDB/50008/2020), INESC-ID (UIDB/50021/2020), and projects MATISSE (DSAIPA/DS/0026/2019) and PREDICT (PTDC/CCI-CIF/29877/2017). This work has received funding from the European Union's Horizon 2020 Research and Innovation Programme under grant agreement no. 951970 (OLISSIPO project). We also acknowledge Sociedade Portuguesa de Reumatologia and all Reuma.pt contributors. Background: Rheumatic diseases are one of the most common chronic diseases worldwide. Among them, spondyloarthritis (SpA) is a group of highly debilitating diseases, with an early onset age, which significantly impacts patients' quality of life, health care systems, and society in general. Recent treatment options consist of using biologic therapies, and establishing the most beneficial option according to the patients' characteristics is a challenge that needs to be overcome. Meanwhile, the emerging availability of electronic medical records has made necessary the development of methods that can extract insightful information while handling all the challenges of dealing with complex, real-world data. Objective: The aim of this study was to achieve a better understanding of SpA patients' therapy responses and identify the predictors that affect them, thereby enabling the prognosis of therapy success or failure. Methods: A data mining approach based on joint models for the survival analysis of the biologic therapy failure is proposed, which considers the information of both baseline and time-varying variables extracted from the electronic medical records of SpA patients from the database, Reuma.pt.
Results: Our results show that being male, starting biologic therapy at an older age, having a larger time interval between disease start and initiation of the first biologic drug, and being human leukocyte antigen (HLA)-B27 positive are indicators of a good prognosis for the biological drug survival; meanwhile, having disease onset or biologic therapy initiation occur in more recent years, a larger number of education years, and higher values of C-reactive protein or Bath Ankylosing Spondylitis Functional Index (BASFI) at baseline are all predictors of a greater risk of failure of the first biologic therapy. Conclusions: Among this Portuguese subpopulation of SpA patients, those who were male, HLA-B27 positive, and with a later biologic therapy starting date or a larger time interval between disease start and initiation of the first biologic therapy showed longer therapy adherence. Joint models proved to be a valuable tool for the analysis of electronic medical records in the field of rheumatic diseases and may allow for the identification of potential predictors of biologic therapy failure.

    A Pipeline for Clustering by Compression with Application to Patient Stratification in Spondyloarthritis

    Funding Information: The authors acknowledge Fundação para a Ciência e Tecnologia, LASIGE Research Unit, ref. UIDB/00408/2020 and ref. UIDP/00408/2020 and Instituto de Telecomunicações Research Unit, ref. UIDB/50008/2020, and UIDP/50008/2020. The authors also acknowledge the Project PREDICT (PTDC/CCI-CIF/29877/2017), funded by Fundo Europeu de Desenvolvimento Regional (FEDER), through Programa Operacional Regional LISBOA (LISBOA2020), and by national funds, through Fundação para a Ciência e Tecnologia (FCT), and projects MATISSE (DSAIPA/DS/0026/2019), MONET (PTDC/CCI-BIO/4180/2020) and SmartGlauco (PTDC/CTM-REF/2679/2020). Publisher Copyright: © 2023 by the authors. The normalized compression distance (NCD) is a similarity measure between a pair of finite objects based on compression. Clustering methods usually use distances (e.g., Euclidean distance, Manhattan distance) to measure the similarity between objects. The NCD is yet another distance with particular characteristics that can be used to build the starting distance matrix for methods such as hierarchical clustering or K-medoids. In this work, we propose Zgli, a novel Python module that enables the user to compute the NCD between files inside a given folder. Inspired by the CompLearn Linux command line tool, this module iterates on it by providing new text file compressors, a new compression-by-column option for tabular data, such as CSV files, and an encoder for small files made up of categorical data. Our results demonstrate that compression by column can yield better results than previous methods in the literature when clustering tabular data. Additionally, the categorical encoder shows that it can augment categorical data, allowing the use of the NCD for new data types. One of the advantages is that using this new feature does not require knowledge or context of the data.
Furthermore, the fact that the new proposed module is written in Python, one of the most popular programming languages for machine learning, potentiates its use by developers to tackle problems with a new approach based on compression. This pipeline was tested on clinical data and proved to be a promising computational strategy, providing patient stratification via clusters and thereby aiding precision medicine.
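
For readers unfamiliar with the NCD, a minimal sketch follows. It uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size, with Python's built-in `zlib` as a stand-in compressor. It does not use or reproduce the Zgli API; the function names and the sample data are assumptions made for illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(items):
    """Pairwise NCD matrix; the diagonal is set to 0.0 by convention."""
    n = len(items)
    return [[0.0 if i == j else ncd(items[i], items[j]) for j in range(n)]
            for i in range(n)]


# Made-up inputs: two nearly identical texts and one block of
# pseudo-random bytes that shares nothing with them.
text_a = b"the quick brown fox jumps over the lazy dog " * 20
text_b = text_a.replace(b"lazy", b"idle", 1)
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(1000))

distances = ncd_matrix([text_a, text_b, noise])
```

In this sketch the two similar texts end up much closer to each other than either is to the random block. A matrix like `distances` is the kind of starting distance matrix the abstract mentions for hierarchical clustering or K-medoids.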

    Pattern matching through Chaos Game Representation: bridging numerical and discrete data structures for biological sequence analysis

    This work was partially supported by FCT through the PIDDAC Program funds (INESC-ID multiannual funding) and under grant PEst-OE/EEI/LA0008/2011 (IT multiannual funding). In addition, it was also partially funded by projects HIVCONTROL (PTDC/EEA-CRO/100128/2008, S. Vinga, PI), TAGS (PTDC/EIA-EIA/112283/2009) and NEUROCLINOMICS (PTDC/EIA-EIA/111239/2009) from FCT (Portugal). Background: Chaos Game Representation (CGR) is an iterated function that bijectively maps discrete sequences into a continuous domain. As a result, discrete sequences can be the object of statistical and topological analyses otherwise reserved to numerical systems. Characteristically, CGR coordinates of substrings sharing an L-long suffix will be located within 2^(-L) distance of each other. In the two decades since its original proposal, CGR has been generalized beyond its original focus on genomic sequences and has been successfully applied to a wide range of problems in bioinformatics. This report explores the possibility that it can be further extended to approach algorithms that rely on discrete, graph-based representations. Results: The exploratory analysis described here consisted of selecting foundational string problems and refactoring them using CGR-based algorithms. We found that CGR can take the role of suffix trees and emulate sophisticated string algorithms, efficiently solving exact and approximate string matching problems such as finding all palindromes and tandem repeats, and matching with mismatches. The common feature of these problems is that they use longest common extension (LCE) queries as subtasks of their procedures, which we show to have a constant time solution with CGR. Additionally, we show that CGR can be used as a rolling hash function within the Rabin-Karp algorithm. Conclusions: The analysis of biological sequences relies on algorithmic foundations facing mounting challenges, both logistic (performance) and analytical (lack of unifying mathematical framework).
CGR is found to provide the latter and to promise the former: graph-based data structures for sequence analysis operations are entailed by numerical-based data structures produced by CGR maps, providing a unifying analytical framework for a diversity of pattern matching problems.
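
The suffix property stated in the abstract can be checked with a short sketch of the CGR iteration for DNA. Each symbol moves the current point halfway towards that symbol's corner of the unit square. The corner assignment, the starting point at the centre of the square, and the use of the maximum coordinate difference as the distance are assumptions made here for illustration; the abstract does not specify them.

```python
# Assumed corner assignment for the four nucleotides.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_point(sequence, start=(0.5, 0.5)):
    """Final CGR coordinate of a DNA string.

    At each step the point moves to the midpoint between its current
    position and the corner of the symbol being read.
    """
    x, y = start
    for symbol in sequence:
        cx, cy = CORNERS[symbol]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
    return x, y


def max_coordinate_distance(p, q):
    """Largest absolute difference between the coordinates of p and q."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


# Two made-up sequences that differ in their prefix but share the
# 5-symbol suffix "TTGCA".
p = cgr_point("ACATTGCA")
q = cgr_point("GGGTTGCA")
suffix_length = 5
gap = max_coordinate_distance(p, q)
```

Because every shared symbol halves the difference between the two points, `gap` cannot exceed `2 ** -suffix_length`. This is why nearby CGR coordinates indicate a long common suffix.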

    One-Pot Aqueous Synthesis of Fluorescent Ag-In-Zn-S Quantum Dot/Polymer Bioconjugates for Multiplex Optical Bioimaging of Glioblastoma Cells

    Cancer research has experienced astonishing advances recently, but cancer remains a major threat because it is one of the leading causes of death worldwide. Glioblastoma (GBM) is the most malignant brain tumor, for which early diagnosis is vital for longer survival. Thus, this study reports the synthesis of novel water-dispersible ternary AgInS2 (AIS) and quaternary AgInS2-ZnS (ZAIS) fluorescent quantum dots using carboxymethylcellulose (CMC) as ligand for multiplexed bioimaging of malignant glioma cells (U-87 MG). Firstly, the AgInS2 core was prepared using a one-pot aqueous synthesis stabilized by CMC at room temperature and physiological pH. Then, an outer layer of ZnS was grown and thermally annealed to improve their optical properties and split the emission range, leading to core-shell alloyed nanostructures. Their physicochemical and optical properties were characterized, demonstrating that luminescent monodispersed AIS and ZAIS QDs were produced with average sizes of 2.2 nm and 4.3 nm, respectively. Moreover, the results evidenced that they were cytocompatible using in vitro cell viability assays towards human embryonic kidney cell line (HEK 293T) and U-87 MG cells. These AIS and ZAIS successfully behaved as fluorescent nanoprobes (red and green, resp.) allowing multiplexed bioimaging and biolabeling of costained glioma cells using confocal microscopy.

    What is Science made of

    Science education, which deals with the sharing and communication of science contents, processes and results with people not belonging to the scientific communities, is being increasingly considered a matter of national concern and a priority in the educational agendas of several countries. Promoting and enhancing the scientific literacy of citizens is currently a major mission for modern societies, as it is believed that it will contribute to a better informed, as well as a more conscious, critical and committed citizenship (Fiolhais, 2011). But education for this scientific literacy is also a great challenge, as target individuals may be very diverse, either in terms of age or in what concerns knowledge backgrounds, and can range from children to adults within the general public. The language, the format and the way this communication/education is made should obviously take into consideration the public profiles to which it is addressed, being as clear, accurate and demystified as possible. Also, several studies have geared to the view that scientific literacy is best taught by seeing science education as “education through science” as opposed to “science through education” (Holbrook and Rannikmae, 2009). Although science and technology are ubiquitous in our everyday lives, their role and impacts are not perceived equally by all citizens. For a large part of the population, science, in particular, is still something unknown, often complex, strange and distant, being seen as something intangible for those not directly related to the scientific arena. For others, the scientist is still seen as a distracted figure wearing a white coat and glasses, fully dedicated to the pursuit of knowledge, though this image is gradually disappearing from today's collective imagination. And what is Science made of? Is it possible to get a clear, precise and instantaneous response? Probably not!
Much of what happens in science and in scientific research is still seen by many as pure magic, something transcendental. However, it is important to discredit such stereotypes, throwing light on scientists' work, science tools as well as on the human skills and qualities that make science happen. With this goal, we conceived and prepared an exhibition entitled “What is Science made of”, aiming to create a new relationship between science and the general public, in an appealing, innovative and challenging way. “What is Science made of” is a set of twelve images, each one starting from a common laboratory object. The images represent a group of words expressing concepts and values that ideally mirror the relationship between science and scientists. In this work we justify the choice of the twelve words, describe the design and assembly of the set of images as well as its exhibition and the public receptiveness and reactions to such an initiative.